Efficient No-Regret Multiagent Learning
Authors
Abstract
We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms so that they sample actions stochastically and observe only the scalar reward of the action played. We show that, with high probability and in polynomial time, the average actual payoff of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of the minimax payoff against arbitrary opponents. In addition, the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, unlike previous work, we do not need to assume that the learner knows the game matrices or can observe the opponents' actions.
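As a rough illustration of the bandit-feedback setting described above (a generic Exp3-style sketch, not the paper's actual construction), the learner below samples actions stochastically and observes only the scalar reward of the action it plays; the reward_fn callback and the payoff vector are hypothetical.

import numpy as np

def exp3(num_actions, horizon, reward_fn, gamma=0.1, rng=None):
    """Exp3-style no-regret learner with bandit feedback.

    Generic sketch: each round the learner samples one action from a mixed
    strategy and observes only the scalar reward of that action, using an
    importance-weighted estimate to update its exponential weights.
    """
    rng = rng or np.random.default_rng(0)
    weights = np.ones(num_actions)
    total_reward = 0.0
    for t in range(horizon):
        # Mix the exponential-weights distribution with uniform exploration.
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / num_actions
        action = rng.choice(num_actions, p=probs)
        reward = reward_fn(t, action)      # scalar feedback for the played action only
        total_reward += reward
        # Unbiased importance-weighted reward estimate for the chosen action.
        estimate = reward / probs[action]
        weights[action] *= np.exp(gamma * estimate / num_actions)
        weights /= weights.max()           # rescale to avoid numerical overflow
    return total_reward / horizon

# Example: a stationary opponent whose (unknown) payoff vector favors action 2.
payoffs = np.array([0.2, 0.5, 0.8])
avg = exp3(num_actions=3, horizon=5000, reward_fn=lambda t, a: payoffs[a])
print(f"average payoff: {avg:.3f}")  # approaches the best-response payoff (0.8), up to exploration

With a stationary reward vector the average payoff climbs toward the best fixed action, which is the flavor of guarantee (1) in the abstract; the paper's contribution concerns sharper polynomial bounds and weaker observability assumptions than such generic constructions.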
Similar resources
Convergence and No-Regret in Multiagent Learning
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer ...
Empirically Evaluating Multiagent Reinforcement Learning Algorithms
This article makes two contributions. First, we present a platform for running and analyzing multiagent reinforcement learning experiments. Second, to demonstrate this platform we undertook and evaluated an empirical test of multiagent reinforcement learning algorithms from the literature, which to our knowledge is the largest such test ever conducted. We summarize some conclusions from our exp...
Unifying Convergence and No-Regret in Multiagent Learning
We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-...
Improving Convergence Rates in Multiagent Learning Through Experts and Adaptive Consultation
We present a multiagent learning algorithm with guaranteed convergence to Nash equilibria for all games. Our approach is a regret-based learning algorithm which combines a greedy random sampling method with consultation of experts that suggest possible strategy profiles. More importantly, by consulting carefully chosen experts we can greatly improve the convergence rate to Nash equilibria, but ...
No-regret learning and a mechanism for distributed multiagent planning
We develop a novel mechanism for coordinated, distributed multiagent planning. We consider problems stated as a collection of single-agent planning problems coupled by common soft constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a tool for factoring the problem). A key idea is to recast the distributed planning problem as learning in a repeated...
Publication date: 2005